---
output:
  pdf_document:
    citation_package: natbib
    keep_tex: false
    fig_caption: true
    toc: true 
    latex_engine: pdflatex
    template: header.tex
title: "Supplementary Material -- A Double Standard? Gender Bias in Voters' Perceptions of Political Arguments"
#author:
#  - name: Lotte Hargrave
 #   affiliation: University College London
geometry: margin=1in
fontsize: 12pt
bibliography: PhD.bib
biblio-style: apsr
#thanks: "**This version**: `r format(Sys.time(), '%B %d, %Y')`." 
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE, fig.pos= "h")
library(data.table)
library(knitr)
library(kableExtra)
library(xtable)
library(ggplot2)
library(corrplot)
library(scales)
library(stargazer)
library(sjPlot)
library(sandwich)

load("data/style_data.Rdata")
load("data/emotion_data.Rdata")
load("data/aggression_data.Rdata")
load("data/evidence_data.Rdata")

```

\setcounter{page}{1}
\setcounter{figure}{0}
\setcounter{table}{0}
\setcounter{equation}{0}
\renewcommand{\thepage}{S\arabic{page}}
\renewcommand{\thesection}{S\arabic{section}}
\renewcommand{\thetable}{S\arabic{table}}
\renewcommand{\thefigure}{S\arabic{figure}}
\renewcommand{\theequation}{S\arabic{equation}}

\doublespacing
\thispagestyle{empty}
\newpage

# Experiment pre-test survey 

Prior to fielding the survey experiment, a pre-test survey was fielded by Prolific to 1,499 members of the UK online panel in August 2021. Speeches were pre-tested without the MP's gender stated, and there were 3 x 2 x 3 (style x style treatment status x policy area) = 18 speeches tested overall.
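
As a minimal sketch of how the 18 pre-test conditions arise, the following enumerates three styles, two treatment statuses, and three policy areas, and then assigns each of 1,499 simulated respondents to one speech. The level labels and the seed are illustrative assumptions rather than the coding used in the study data, and for the evidence style the two statuses correspond to statistics and anecdote.

```r
# Enumerate the pre-test conditions implied by the design.
# Level labels are illustrative assumptions, not the study's variable coding.
conditions <- expand.grid(
  style  = c("emotion", "aggression", "evidence"),
  status = c("control", "treatment"),
  policy = c("housing", "health", "transport"),
  stringsAsFactors = FALSE
)

n_conditions <- nrow(conditions)  # 3 styles x 2 statuses x 3 policy areas

# Randomly assign each simulated respondent to exactly one speech.
set.seed(2021)  # arbitrary seed, for reproducibility of the sketch only
assignment <- conditions[sample(n_conditions, size = 1499, replace = TRUE), ]

n_conditions
table(assignment$style, assignment$status)
```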

Following an introduction screen describing the task, respondents were randomly assigned to read one speech and asked to answer several questions about the speech and the MP. First, depending on the style type assigned, respondents were asked whether they agreed or disagreed that the speech was "emotional", "aggressive" or "evidence-based". The purpose of this was to ensure that the treatment texts were representative of the style types, and that the control texts were not representative of the style types. An example of the task and question can be seen in figure \ref{img:pretesting}. 

Results from the style pre-testing exercise are presented in table \ref{tab:pre-test}. Responses are presented as the extent to which respondents (strongly) disagree, neither agree nor disagree, or (strongly) agree that the speech was "emotional", "aggressive" or "evidence-based". Figures \ref{fig:emotion}-\ref{fig:evidence} show the responses collapsed by style, style prevalence, and policy area.
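
The collapsing of responses into three categories can be sketched as follows. This is an illustration on simulated responses, assuming a five-point agree/disagree scale; the object names (`likert_levels`, `responses`, `collapsed`, `shares`) are hypothetical and the resulting percentages are not the study's results.

```r
# Simulated five-point responses for one condition (not study data).
likert_levels <- c("Strongly disagree", "Disagree",
                   "Neither agree nor disagree",
                   "Agree", "Strongly agree")
set.seed(1)
responses <- factor(sample(likert_levels, size = 200, replace = TRUE),
                    levels = likert_levels)

# Merge the two disagree categories and the two agree categories.
collapsed <- responses
levels(collapsed) <- list(
  "(Strongly) disagree"        = c("Strongly disagree", "Disagree"),
  "Neither agree nor disagree" = "Neither agree nor disagree",
  "(Strongly) agree"           = c("Agree", "Strongly agree")
)

# Percentage of respondents in each collapsed category.
shares <- round(100 * prop.table(table(collapsed)), 1)
shares
```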

\footnotesize
\singlespacing

|Style type|(Strongly) disagree| Neither agree nor disagree |(Strongly) agree|
|:--------|:-------|:-----------------|:-------|
|Emotion control|60% | 21.7% | 18.3% | 
|Emotion treatment|10.1% | 17.4% | 72.5% |
|Aggression control| 65.2% | 10.4% | 24.4% |
|Aggression treatment| 19.9% | 12.4% | 67.7% |
|Evidence statistics| 6.4%| 12.8% | 80.8% |
|Evidence anecdote| 28.8% | 14.4% | 56.8% |
Table: Results from treatment pre-testing\label{tab:pre-test}

\normalsize
\doublespacing

![Example of pre-testing task (Aggression, treatment, health)\label{img:pretesting}](figure_S1_pretesting_prompt.jpg)

\afterpage{

\vspace*{\fill}
\begin{figure}
\includegraphics{analysis/plots/figure_S2_emotion_pretesting.pdf}
\caption{Distribution of responses across treatment group status for emotion}
\label{fig:emotion}
\end{figure}
\vfill
}

\afterpage{

\vspace*{\fill}
\begin{figure}
\includegraphics{analysis/plots/figure_S3_aggression_pretesting.pdf}
\caption{Distribution of responses across treatment group status for aggression}
\label{fig:aggression}
\end{figure}
\vfill

}

\afterpage{

\vspace*{\fill}
\begin{figure}
\includegraphics{analysis/plots/figure_S4_evidence_pretesting.pdf}
\caption{Distribution of responses across treatment group status for evidence}
\label{fig:evidence}
\end{figure}
\vfill

}

\newpage

Overall, the results from the style pre-testing exercise confirm that the treatments are perceived as representative of the styles. For instance, 72.5% of respondents assigned to the emotion treatment conditions either agreed or strongly agreed that they were emotional. Similarly, 67.7% of respondents assigned to the aggression treatment conditions agreed or strongly agreed that they were aggressive. Further, the pre-testing results provide good evidence that the control texts were not perceived to be particularly representative of the styles. For instance, 60% of respondents assigned to the emotion control conditions either disagreed or strongly disagreed that they were emotional. Additionally, 65.2% of the respondents assigned to the aggression control conditions either disagreed or strongly disagreed that they were aggressive. While the majority of respondents perceived that both the statistics and anecdote speeches were evidence-based, a larger percentage of respondents perceived the statistical speeches as evidence-based than the anecdotal speeches (80.8% and 56.8% respectively).

Further, the results are consistent across the policy areas for each style. For example, for the emotion controls, between 45% and 50% of respondents "disagreed" that the arguments were emotional across the three policy areas, and roughly 10% "strongly disagreed". Similarly, for the aggression treatments, roughly 45% to 50% of respondents "agreed" that the arguments were aggressive.

Second, to ensure that treatment texts were not too long for respondents to read and understand, all respondents were asked the follow-up question "Did you find this text too long and/or too complicated to understand?". Across all 18 treatments, 95.7% said that the speeches were not too long and/or complicated to understand, and only 4.3% said they were. Overall, there is good evidence that the style treatments work as planned, and that the treatment texts are not too long for respondents to read and understand.

\newpage

# Treatment texts 

The full set of gendered names used in the experiment is displayed in table \ref{tab:gendered_names}.

\singlespacing

|Man forename | Woman forename | Surname | 
|:-----|:-----|:-----|
|Adam | Beth | Craddock |
|Jack | Charlotte | Jones |
|Peter | Lucy | Richards |
Table: Gendered names for experimental design\label{tab:gendered_names}

\doublespacing

In table \ref{tab:style_definitions} I include a definition of each of the styles of interest, which are drawn from the gender stereotypes literature. 

\singlespacing

\begin{longtable}{{l}p{0.65\textwidth}}
\caption{Style type definitions}
\\
\toprule
Style & Definition \\\midrule
Emotion & Positive emotional language, which includes expressing empathy, praise, celebration, congratulations, hope, and joy \\
& \\
Aggression & Language that relates to conflict, political point-scoring, criticisms, or insults \\ 
& \\
Statistical evidence & Use of numbers, statistics, numeric quantifiers, figures, and empirical evidence as the basis for an MP's argument \\
&  \\
Anecdotal evidence & Use of personal examples or experiences, stories of other people, constituency stories, or illustrative examples as the basis of an MP's argument \\
\bottomrule
\label{tab:style_definitions}
\end{longtable}

\doublespacing

In table \ref{tab:treatment_texts}, I present the full text of each of the speeches. The "treatment" for each of the styles is indicated in square brackets and in bold font.

\blandscape 
\footnotesize
\singlespacing

\begin{longtable}{{l}p{0.90\textwidth}}
\caption{Treatment texts}
\\
\toprule
Treatment number & Treatment text \\\midrule
Treatment 1: Emotion, Control, Housing & Our housing market has not been as it should for years. It is the job of all of us to increase the supply of housing available. But for many young people the gap between wages and house prices is too wide for homeownership to be viable any time soon. Many people live in unstable rented housing who may be driven out by increasing rent costs. Work should be done to help people move out of rented accommodation and become homeowners. We need different strategies to help to increase supply and make homes more affordable. We should build new homes, and we should repurpose empty homes. Builders, investors, and local councils will need to work together for change to occur. Our housing market does not work for many people. There is a need for new policy to help home ownership become realistic for young people all around the country. \\
& \\
Treatment 2: Emotion, Treatment, Housing & \textbf{[The idea of owning your own home is one filled with such excitement for people.]} It is the job of all of us to increase the supply of housing. \textbf{[I recall with such warmth that sense of euphoria, exhilaration, and real achievement when I held the keys to my first home. I can, hand on my heart, say that it was one of the proudest days of my life.]} But for many young people the gap between wages and house prices is too wide for homeownership to be viable any time soon. We need different strategies to help to increase supply and make homes more affordable. \textbf{[I am hopeful that, if we work together, we can help people get closer to experiencing the joy of homeownership themselves. Changing the lives of the people I represent to help reach brighter futures is, after all, really what makes my job such a pleasure to do.]} \\
& \\
Treatment 3: Aggression, Control, Housing & Our housing market has not been as it should for years. It is the job of all of us to increase the supply of housing available. But for many young people the gap between wages and house prices is too wide for homeownership to be viable any time soon. Many people live in unstable rented housing who may be driven out by increasing rent costs. Work should be done to help people move out of rented accommodation and become homeowners. We need different strategies to help to increase supply and make homes more affordable. We should build new homes, and we should repurpose empty homes. Builders, investors, and local councils will need to work together for change to occur. Our housing market does not work for many people. There is a need for new policy to help home ownership become realistic for young people all around the country. \\
& \\
& \\
& \\
\midrule
Treatment number & Treatment text \\\midrule
Treatment 4: Aggression, Treatment, Housing & \textbf{[I have to start by saying I utterly disagree with what you've just said. You clearly lack any understanding of how serious this is, and, frankly, you show absolutely no care for the people we represent.]} It is the job of all of us to increase the supply of housing. But for many young people the gap between wages and house prices is too wide for homeownership to be viable any time soon. \textbf{[Too many people live in insecure rented houses owned by exploitative, greedy, penny-pinching landlords who milk their tenants for all they're worth.]} We need different strategies to help to increase supply and make homes more affordable. \textbf{[Young people have been utterly abandoned and left fearful about their futures because we have catastrophically failed to do enough to help them. Things must change, and anyone who opposes this cannot claim they care about the well-being of our people.]} \\
& \\
Treatment 5: Evidence, Statistics, Housing & Our housing market has not been as it should for years. It is the job of all of us to increase the supply of housing. But for many young people the gap between wages and house prices is too wide for homeownership to be viable any time soon. \textbf{[Those in their mid-30s to mid-40s are three times more likely to be renters than 20 years ago. People in the early 1990s could expect to pay 3.5 times their annual earnings on buying a home, but this has risen to 7.8 today. In 2019, the average property sold for £235,300, meanwhile the average pay came in at £29,000.]} Our housing market does not work for many people. We need different strategies to help to increase supply and make homes more affordable. There is a need for new policy to help homeownership become realistic for young people all around the country.  \\
& \\
Treatment 6: Evidence, Anecdote, Housing & Our housing market has not been as it should for years. It is the job of all of us to increase the supply of housing. But for many young people the gap between wages and house prices is too wide for homeownership to be viable any time soon. \textbf{[I spoke recently to Eleanor and Michael, a young couple in my constituency, who have been renting together for a few years and want to buy their first home. They told me they are extremely concerned that their dream of homeownership simply will not be one they can reach unless something changes.]} Our housing market does not work for many people. We need different strategies to help to increase supply and make homes more affordable. There is a need for new policy to help homeownership become realistic for young people, \textbf{[just like Eleanor and Michael]}, all around the country. \\
& \\
& \\
& \\
& \\
& \\
& \\
\midrule
Treatment number & Treatment text \\\midrule
Treatment 7: Emotion, Control, Health & The way we understand diseases and how to treat them has grown over the years. There have been advances in treatments that might help save the lives of people in this country and we should be committed to using them. But in some cases, patients are left without access to the drugs they need because of the costs of treatment. Patients being able to access the treatments they require is one of the principles of providing universal health care to everyone who needs it. I think some people find it challenging to accept that questions of money should enter decisions about health care, but this is the situation that many hospitals, doctors, and patients find themselves in. It is important that how we fund and pay for drugs works for everybody. Change should happen to make sure that both medical services and patients get the treatments and resources they need. \\
& \\
Treatment 8: Emotion, Treatment, Health & The way we understand diseases and how to treat them has grown over the years. There have been advances in treatments that might help save the lives of people in this country and we should be committed to using them. \textbf{[These advances are miraculous. I am brought to tears when I hear heartening stories about young children's lives being saved and getting to see the joy and relief of families getting to spend more time with their loved ones. Being with the ones we love is really the only thing that matters.]} But in some cases, patients are left without access to the drugs they need because of the costs of treatment. It is important that how we fund and pay for drugs works for everybody. \textbf{[This isn't only about improving our country for the better, but about ensuring that we can be hopeful about a future of humanity that shines bright.]} \\
& \\
Treatment 9: Aggression, Control, Health & The way we understand diseases and how to treat them has grown over the years. There have been advances in treatments that might help save the lives of people in this country and we should be committed to using them. But in some cases, patients are left without access to the drugs they need because of the costs of treatment. Patients being able to access the treatments they require is one of the principles of providing universal health care to everyone who needs it. I think some people find it challenging to accept that questions of money should enter decisions about health care, but this is the situation that many hospitals, doctors, and patients find themselves in. It is important that how we fund and pay for drugs works for everybody. Change should happen to make sure that both medical services and patients get the treatments and resources they need. \\
& \\
& \\
& \\
\midrule
Treatment number & Treatment text \\\midrule
Treatment 10: Aggression, Treatment, Health & \textbf{[I have to start by saying I utterly disagree with what you've just said. You clearly lack any understanding of how serious this is, and, frankly, you show absolutely no care for the people we represent.]} The way we understand diseases and how to treat them has grown over the years. There have been advances in treatments that might help save the lives of people in this country and we should be committed to using them. \textbf{[But often nothing is done because it is simply deemed not worth the money. I am utterly revolted by the idea that people are not getting treated merely because of some appalling cost-benefit calculation. This is disgusting, deplorable, and inhuman.]} It is important that how we fund and pay for drugs works for everybody. \textbf{[Things must change, and anyone who opposes this cannot claim they care about the well-being of our people.]} \\
& \\
Treatment 11: Evidence, Statistics, Health & The way we understand diseases and how to treat them has grown over the years. But in some cases, patients are left without access to the drugs they need because of the costs of treatment. I think some people find it challenging to accept that questions of money should enter decisions about health care, but this is the situation that many hospitals, doctors, and patients find themselves in. \textbf{[The National Health Service has a fixed budget of only around £110 billion a year in England.]} It is important that how we fund and pay for drugs works for everybody. \textbf{[For example, between 2008 and 2016, one drug increased in price by 12,000\%. This sort of increase isn't sustainable, and if the price had only stayed the same, the NHS could have spent £58 million less.]} Change should happen to make sure that both medical services and patients get the treatments and resources they need. \\ 
& \\
Treatment 12: Evidence, Anecdote, Health & The way we understand diseases and how to treat them has grown over the years. \textbf{[I spoke recently to Eleanor and Michael, a young couple in my constituency, and they told me the story of how Michael was able to get the treatment he needed to be declared cancer free and for his life to be saved.]} But in other cases, patients are left without access to the drugs they need because of the costs of treatment. I think some people find it challenging to accept that questions of money should enter decisions about health care, but this is the situation that many hospitals, doctors, and patients find themselves in. It is important that how we fund and pay for drugs works for everybody, \textbf{[just like it did for Eleanor and Michael]}. Change should happen to make sure that both medical services and patients get the treatments and resources they need. \\
& \\
& \\
\midrule
Treatment number & Treatment text \\\midrule
Treatment 13: Emotion, Control, Transport & Transport is the largest carbon-emitting sector of the UK economy, and, within this, cars contribute most. Air pollution increases the risk of heart disease, cancer, diabetes, and asthma attacks. Electric vehicles offer one method of reducing emissions as they produce no air pollution. We should aim for all new vehicles made to be run either partially or wholly on electricity. Electric vehicles offer clear benefits for improving local air quality as they produce no exhaust emissions at the street level. The market for electric vehicles is small yet growing. We can be industry leaders in how we produce and use electric vehicles. A variety of different strategies should be employed to encourage their uptake. We should widen accessibility in the use of electric vehicles to make them more practical for those living in urban or built-up areas. For instance, by making sure there is sufficient charging infrastructure. If we use electric vehicles, journeys can be greener and safer. \\
& \\
Treatment 14: Emotion, Treatment, Transport & Transport is the largest carbon-emitting sector of the UK economy, and, within this, cars contribute most. Air pollution increases the risk of heart disease, cancer, diabetes, and asthma attacks. Electric vehicles offer one method of reducing emissions as they produce no air pollution. \textbf{[Imagine stepping outside on a bright, beautiful morning and hearing not engines revving nor choking on polluted air but feeling that simple joy of hearing the birds sing and that rush of fresh air into your lungs. Doesn't this sound amazing? This could be our future, and I so hope that this doesn't have to be a distant dream.]} We should widen accessibility in the use of electric vehicles to make them more practical for those living in urban or built-up areas.  If we use electric vehicles, journeys can be greener and safer. \textbf{[We can fill our lives with the simple pleasures of bird song in our ears, fresh air in our lungs, and blue skies ahead.]} \\
& \\
Treatment 15: Aggression, Control, Transport & Transport is the largest carbon-emitting sector of the UK economy, and, within this, cars contribute most. Air pollution increases the risk of heart disease, cancer, diabetes, and asthma attacks. Electric vehicles offer one method of reducing emissions as they produce no air pollution. We should aim for all new vehicles made to be run either partially or wholly on electricity. Electric vehicles offer clear benefits for improving local air quality as they produce no exhaust emissions at the street level. The market for electric vehicles is small yet growing. We can be industry leaders in how we produce and use electric vehicles. A variety of different strategies should be employed to encourage their uptake. We should widen accessibility in the use of electric vehicles to make them more practical for those living in urban or built-up areas. For instance, by making sure there is sufficient charging infrastructure. If we use electric vehicles, journeys can be greener and safer.\\
& \\
\midrule
Treatment number & Treatment text \\\midrule
Treatment 16: Aggression, Treatment, Transport & \textbf{[I have to start by saying I utterly disagree with what you've just said. You clearly lack any understanding of how serious this is, and, frankly, you show absolutely no care for the people we represent.]} Transport is the largest carbon-emitting sector of the UK economy, and, within this, cars contribute most. Air pollution increases the risk of heart disease, cancer, diabetes, and asthma attacks. Electric vehicles offer one method of reducing emissions as they produce no air pollution. \textbf{[People are choking to death because we are failing to clean up the toxic air we breathe. It is shameful and nothing has been done. Why?]} We should widen accessibility in the use of electric vehicles to make them more practical for those living in urban or built-up areas. If we use electric vehicles, journeys can be greener and safer. \textbf{[Things must change, and anyone who opposes this cannot claim they care about the well-being of our people.]} \\ 
& \\
Treatment 17: Evidence, Statistics, Transport & Transport is the largest carbon-emitting sector of the UK economy, and, within this, cars contribute most. \textbf{[Last year, the transport sector accounted for 29.8\% of total carbon dioxide emissions.]} Air pollution increases the risk of heart disease, cancer, diabetes, and asthma attacks. \textbf{[In London alone, there are 9,400 premature deaths every year because of poor air quality.]} Electric vehicles offer one method of reducing emissions as they produce no air pollution. The market for electric vehicles is small yet growing. \textbf{[Last year saw the biggest ever annual increase in electric vehicles, with a growth of 175,000 new vehicles, which was up 66\% on 2019.]} We can be industry leaders in how we produce and use electric vehicles. We should widen accessibility in the use of electric vehicles to make them more practical for those living in urban or built-up areas. If we use electric vehicles, journeys can be greener and safer.  \\
& \\
Treatment 18: Evidence, Anecdote, Transport & Transport is the largest carbon-emitting sector of the UK economy, and, within this, cars contribute most. Air pollution increases the risk of heart disease, cancer, diabetes, and asthma attacks. Electric vehicles offer one method of reducing emissions as they produce no air pollution. The market for electric vehicles is small yet growing. We can be industry leaders in how we produce and use electric vehicles. \textbf{[I spoke recently to Eleanor and Michael, a young couple in my constituency who live in a flat in a high-rise building. They told me while they want to make the swap to an electric vehicle, it is just not practical as they do not have easy access to a charging point.]} We should widen accessibility in the use of electric vehicles to make them more practical for people, \textbf{[just like Eleanor and Michael]}, who live in urban or built-up areas. If we use electric vehicles, journeys can be greener and safer. \\
\\
\bottomrule
\label{tab:treatment_texts}
\end{longtable}


\normalsize
\doublespacing
\elandscape 


\newpage

# Full models 

In this section, I present the full regression results from the analysis presented in the main body of the paper. 

## Unconditional effects

First, I present the results from the analysis investigating the unconditional effects between style usage and style perceptions, likeability evaluations, and competence evaluations in tables \ref{tab:unconditional_perceptions}--\ref{tab:unconditional_competence}.

\bigskip 

```{r, results = 'asis'}

emotion_perception <- lm(perceived_emotion ~ objective_style_emotion, data = emotion)
aggression_perception <- lm(perceived_aggression ~ objective_style_aggression, data = aggression)
evidence_perception <- lm(perceived_evidence ~ prevalence_evidence, data = evidence)

stargazer(emotion_perception,aggression_perception, evidence_perception,
          header=FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"), 
          covariate.labels = c("Intercept", "Emotional", "Aggressive", "Anecdotal"), 
          dep.var.labels = c("Perceived emotion", "Perceived aggression", "Perceived evidence"), 
          font.size="footnotesize",
          label = "tab:unconditional_perceptions",
          column.sep.width = "-5pt", 
          intercept.bottom=FALSE, 
          title="Relationship between style usage and style perceptions")

```

```{r, results = 'asis'}

emotion_likeability <- lm(perceived_likeability ~ objective_style_emotion, data = emotion)
aggression_likeability <- lm(perceived_likeability ~ objective_style_aggression, data = aggression)
evidence_likeability <- lm(perceived_likeability ~ prevalence_evidence, data = evidence)

stargazer(emotion_likeability,aggression_likeability, evidence_likeability, 
          header=FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"), 
          covariate.labels = c("Intercept", "Emotional", "Aggressive", "Anecdotal"), 
          dep.var.labels = c("Likeability"), 
          column.labels = c("Emotion", "Aggression", "Evidence"),
          font.size="footnotesize",
          label = "tab:unconditional_likeability",
          intercept.bottom=FALSE, 
          title="Relationship between style usage and likeability evaluations")

```

```{r, results = 'asis'}

emotion_competence <- lm(perceived_competence ~ objective_style_emotion, data = emotion)
aggression_competence <- lm(perceived_competence ~ objective_style_aggression, data = aggression)
evidence_competence <- lm(perceived_competence ~ prevalence_evidence, data = evidence)


stargazer(emotion_competence,aggression_competence, evidence_competence, 
          header=FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"), 
          covariate.labels = c("Intercept", "Emotional", "Aggressive", "Anecdotal"), 
          dep.var.labels = c("Competence"), 
          column.labels = c("Emotion", "Aggression", "Evidence"),
          font.size="footnotesize",
          label = "tab:unconditional_competence",
          intercept.bottom=FALSE, 
          title="Relationship between style usage and competence evaluations")

```

\newpage 

## Conditional effects by MP gender 

Second, I present the results from the analysis investigating the conditional effects by MP gender in tables \ref{tab:likeability_models}--\ref{tab:style_prevalence_models}. Table \ref{tab:likeability_models} below shows the full regression output for the relationship between MP gender, MP-likeability evaluations, and additional control variables. Models 1, 3, and 5 show this output for the emotion, aggression, and evidence styles respectively. Models 2, 4, and 6 show the corresponding non-pre-registered regression output with the inclusion of additional controls for policy area. The treatment styles are the emotional style, aggressive style, and anecdotal evidence; the control styles are the non-emotional style, non-aggressive style, and statistical evidence.
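
As a sketch of the specification behind these models, written in notation that is assumed here for illustration and follows the `lm()` formulas used in this section, each model without policy controls takes the form

$$Y_i = \beta_0 + \beta_1 W_i + \beta_2 T_i + \beta_3 (W_i \times T_i) + \gamma' X_i + \varepsilon_i$$

where $Y_i$ is respondent $i$'s evaluation of the MP, $W_i$ indicates a woman MP, $T_i$ indicates the treatment style, and $X_i$ collects the pre-treatment covariates. Under this coding, $\beta_2$ is the effect of the treatment style for men MPs, $\beta_2 + \beta_3$ is the effect for women MPs, and $\beta_3$ is therefore the difference in the effect of the treatment style by MP gender. The models with policy controls add indicators for policy area to the right-hand side.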

In all three models for the likeability outcomes, I have chosen to include respondent gender, age, and degree education as pre-treatment covariates, as, in expectation, they will increase the statistical power of the analysis by better explaining the variation in the likeability outcomes. For instance, a respondent's gender may explain how likeable they deem a politician to be, and, indeed, the results from the conditional analysis by respondent gender presented in tables \ref{tab:conditional_respondent_emotion_1}--\ref{tab:conditional_respondent_evidence_1} suggest that this is the case. Similarly, a respondent's age may provide a plausible explanation for why they would evaluate a politician delivering, say, an aggressive argument as more or less likeable. 
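
The logic of including prognostic pre-treatment covariates can be illustrated with a short simulation. This is a minimal sketch on simulated data with assumed effect sizes, not the study data: because treatment is randomised, both models target the same treatment effect, but the model that adjusts for a covariate related to the outcome leaves less unexplained variation and so estimates that effect with a smaller standard error.

```r
# Simulated illustration (not study data): adjusting for a prognostic
# pre-treatment covariate reduces the standard error of a randomised
# treatment effect. All effect sizes below are assumptions.
set.seed(42)
n <- 2000
treatment <- rbinom(n, 1, 0.5)   # randomised treatment indicator
age       <- runif(n, 18, 80)    # hypothetical pre-treatment covariate
outcome   <- 3 + 0.3 * treatment + 0.05 * age + rnorm(n)

unadjusted <- lm(outcome ~ treatment)
adjusted   <- lm(outcome ~ treatment + age)

se_unadjusted <- summary(unadjusted)$coefficients["treatment", "Std. Error"]
se_adjusted   <- summary(adjusted)$coefficients["treatment", "Std. Error"]

c(unadjusted = se_unadjusted, adjusted = se_adjusted)
```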

```{r, results='asis'}

emotion_likeability_covariates <- lm(perceived_likeability ~ mp_gender*objective_style +
                                              respondent_gender + age + degree_educated, data = emotion) 

emotion_likeability_policy <- lm(perceived_likeability ~ mp_gender*objective_style +
                                              respondent_gender + age + degree_educated + policy, data = emotion) 

aggression_likeability_covariates <- lm(perceived_likeability ~ mp_gender*objective_style +
                                                 respondent_gender + age + degree_educated,
                                               data = aggression)

aggression_likeability_policy <- lm(perceived_likeability ~ mp_gender*objective_style +
                                                 respondent_gender + age + degree_educated + policy,
                                               data = aggression)

evidence_likeability_covariates <- lm(perceived_likeability ~ mp_gender*objective_style + 
                                               respondent_gender + age + degree_educated, 
                                             data = evidence)

evidence_likeability_policy <- lm(perceived_likeability ~ mp_gender*objective_style + 
                                               respondent_gender + age + degree_educated + policy, 
                                             data = evidence)

stargazer(emotion_likeability_covariates, emotion_likeability_policy, aggression_likeability_covariates, aggression_likeability_policy, evidence_likeability_covariates, evidence_likeability_policy,
          header=FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"), 
          covariate.labels = c("Intercept", "Woman MP", "Treatment Style", "Woman Voter", "Age", "Degree Educated", "Housing Policy", "Health Policy", "Woman MP*Treatment Style"), 
          dep.var.labels = c("Likeability"), 
          column.labels = c("Emotion", "Aggression", "Evidence"),
          column.separate = c(2,2,2),
          column.sep.width = "-5pt", 
          font.size="footnotesize",
          label = "tab:likeability_models",
          intercept.bottom=FALSE, 
          title="Relationship between MP gender and MP-likeability evaluations (treatment styles are emotional, aggressive, and anecdotal. Control styles are non-emotional, non-aggressive, and statistical)")
```

Table \ref{tab:competence_models} below reports the full regression output for the relationship between MP gender, MP-competence evaluations, and additional control variables. Model 1 shows the output for the emotion style, and model 2 shows the non-pre-registered output with the inclusion of additional controls for policy area. Model 3 shows the output for the aggression style, and model 4 shows the non-pre-registered output with the inclusion of additional controls for policy area. Model 5 shows the output for the evidence style, and model 6 shows the non-pre-registered output with the inclusion of additional controls for policy area. The treatment styles are the emotional style, aggressive style, and anecdotal evidence; the control styles are the non-emotional style, non-aggressive style, and statistical evidence.

As with the likeability outcomes, I have selected pre-treatment covariates that may plausibly explain variation in respondents' competence evaluations. These are respondent gender, age, degree education, and political attention. For instance, political attention has been included as a pre-treatment covariate as respondents with higher levels of political attention may be better equipped to judge the competence of politicians delivering arguments on political issues than respondents with low political attention. Similarly, education has been included as a voter's education level may plausibly explain how competent they deem politicians to be. 


```{r, results='asis'}

emotion_competence_covariates <- lm(perceived_competence ~ mp_gender*objective_style + 
                                             respondent_gender + age + degree_educated + political_attention,
                                           data = emotion)

emotion_competence_policy <- lm(perceived_competence ~ mp_gender*objective_style + 
                                             respondent_gender + age + degree_educated + political_attention + policy,
                                           data = emotion)

aggression_competence_covariates <- lm(perceived_competence ~ mp_gender*objective_style + 
                                                respondent_gender + age + degree_educated + political_attention,
                                              data = aggression)

aggression_competence_policy <- lm(perceived_competence ~ mp_gender*objective_style + 
                                                respondent_gender + age + degree_educated + political_attention + policy,
                                              data = aggression)

evidence_competence_covariates <- lm(perceived_competence ~ mp_gender*objective_style + 
                                              respondent_gender + age + degree_educated + political_attention,
                                            data = evidence)

evidence_competence_policy <- lm(perceived_competence ~ mp_gender*objective_style + 
                                              respondent_gender + age + degree_educated + political_attention + policy,
                                            data = evidence)

stargazer(emotion_competence_covariates, emotion_competence_policy, aggression_competence_covariates, aggression_competence_policy, evidence_competence_covariates, evidence_competence_policy, 
          header=FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"), 
          covariate.labels = c("Intercept", "Woman MP", "Treatment Style", "Woman Voter", "Age", "Degree Educated", "Political Attention", "Housing Policy", "Health Policy", "Woman MP*Treatment Style"), 
          dep.var.labels = c("Competence"), 
          column.labels = c("Emotion", "Aggression", "Evidence"),
          column.separate = c(2, 2, 2), 
          column.sep.width = "-5pt", 
          font.size="footnotesize", 
          label = "tab:competence_models",
          title="Relationship between MP gender and MP-competence evaluations (treatment styles are emotional, aggressive, and anecdotal, control styles are non-emotional, non-aggressive, and statistical)", 
          intercept.bottom = FALSE)

```


The output from model 1 in table \ref{tab:style_prevalence_models} below shows the full regression output for the relationship between MP gender, perceived emotion, and additional control variables as displayed below for the emotion style, and model 2 shows the non-pre-registered regression output with the inclusion of additional controls for policy area. The output from model 3 shows the full regression output for the relationship between MP gender, perceived aggression, and additional control variables as displayed below for the aggression style, and model 4 shows the non-pre-registered regression output with the inclusion of additional controls for policy area. The output from model 5 shows the full regression output for the relationship between MP gender, perceived evidence, and additional control variables as displayed below for the evidence style, and model 6 shows the non-pre-registered regression output with the inclusion of additional controls for policy area. In all three models, the treatment styles are the emotional style, aggressive style, and anecdotal evidence. The control styles are the non-emotional style, the non-aggressive style, and statistical evidence. 

Finally, as with the above models, I have again included pre-treatment covariates that may plausibly explain respondents' style perceptions. For emotion and aggression, these are respondent gender, left-right placement, and age. For evidence, these are respondent gender, age, and degree education. Left-right placement has been included in the models for perceived emotion and aggression as work has suggested that left-wing ideology has historically been associated with greater emotionality than right-wing ideology, and therefore voters' left-right positioning may explain variation in the emotion and aggression perceptions [e.g. @Salmela2018]. Similarly, I include degree education in the perceived evidence model, as a respondent's degree education may plausibly explain the extent to which they perceive anecdotal or statistical arguments to be evidence-based. 

```{r, results='asis'}

emotion_prevalence_covariates <- lm(perceived_emotion ~ mp_gender*objective_style +
                                             respondent_gender + left_right_placement + age,
                                           data = emotion)

emotion_prevalence_policy <- lm(perceived_emotion ~ mp_gender*objective_style +
                                             respondent_gender + left_right_placement + age + policy,
                                           data = emotion)

aggression_prevalence_covariates <- lm(perceived_aggression ~ mp_gender*objective_style + 
                                                respondent_gender + left_right_placement + age,
                                              data = aggression)

aggression_prevalence_policy <- lm(perceived_aggression ~ mp_gender*objective_style + 
                                                respondent_gender + left_right_placement + age + policy,
                                              data = aggression)

evidence_prevalence_covariates <- lm(perceived_evidence ~ mp_gender*objective_style + 
                                              respondent_gender + age + degree_educated, 
                                            data = evidence)

evidence_prevalence_policy <- lm(perceived_evidence ~ mp_gender*objective_style + 
                                              respondent_gender + age + degree_educated + policy, 
                                            data = evidence)

stargazer(emotion_prevalence_covariates, emotion_prevalence_policy, aggression_prevalence_covariates, aggression_prevalence_policy, evidence_prevalence_covariates, evidence_prevalence_policy, 
          header=FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"), 
          covariate.labels = c("Intercept", "Woman MP", "Treatment Style", "Woman Voter", "Left-Right Placement", "Age", "Housing Policy", "Health Policy", "Degree Educated", "Woman MP*Treatment Style"), 
          dep.var.labels = c("Perceived emotion", "Perceived aggression", "Perceived evidence"), 
          column.separate = c(2,2,2),
          intercept.bottom = FALSE,
          column.sep.width = "-5pt", 
          font.size="footnotesize", 
          label = "tab:style_prevalence_models",
          title="Relationship between MP gender and style perceptions (treatment styles are emotional, aggressive, and anecdotal. Control styles are non-emotional, non-aggressive, and statistical)")

```

\newpage

# Multiple comparisons correction

In the empirical strategy carried out in the main body of the paper, numerous statistical tests are conducted, and there is therefore a risk of the multiple comparisons problem. To quote @Gelman2012[189-190], the multiple comparisons problem "is that the probability that a researcher wrongly concludes that there is at least one statistically significant effect across a set of tests, even when in fact there is nothing going on, increases with each additional test". In other words, when conducting a very large number of tests, any one might be "significant" by chance alone, meaning that the unadjusted $p$-values are unlikely to capture the true Type 1 error rate. Therefore, the probability of falsely rejecting at least one correct null hypothesis increases with each additional test. 

To assess whether the results I present in the main body are robust, I here carry out subsequent analyses with adjusted $p$-values which control for the False Discovery Rate using the Benjamini--Hochberg procedure [@Benjamini1995]. An alternative common approach is the Bonferroni correction. Under this approach, the threshold at which a test is evaluated is based on the total number of tests performed. Practically speaking, the significance threshold is calculated as the original $\alpha$ level divided by the number of tests performed (equivalently, each $p$-value is multiplied by the number of tests). However, the Bonferroni correction is especially conservative when the tests conducted are positively correlated, which is likely in the case of this experimental design, where any form of gender-bias driving perceptions of women's use of, say, aggression is also likely to inform how respondents perceive and evaluate women's use of, say, statistical evidence. Further, by targeting the Type 1 error problem, the Bonferroni correction increases the likelihood of Type 2 errors. By lowering the threshold needed to reject the null hypothesis, it increases the number of instances where the null is not rejected when it is in fact false. As @Gelman2012[192] argue: "the Bonferroni correction can severely reduce our power to detect an important effect". 

I instead opt to use the Benjamini--Hochberg [@Benjamini1995] procedure, which is less stringent than the Bonferroni correction and is more appropriate in the context of this experimental design. In practice, when using the Benjamini--Hochberg procedure, an $\alpha$ level is selected, which here is 0.05. Next, the $p$-values of all hypothesis tests are ordered from smallest to largest. Then I identify the largest $p$-value which satisfies the criterion $p_{k}\leq \frac{k}{m}\alpha$, where $k$ is the $p$-value's index in this ordering and $m$ is the total number of tests. This test, and all tests with smaller $p$-values, are declared significant. I apply this procedure to all of the unconditional and conditional models run in the main body of the text. The relevant $p$-values in the unconditional models are those that relate to the differences between treatment and control styles -- $\beta_1$. The relevant $p$-values in the conditional models are those that relate to the differences in the effect between treatment and control styles for men and women MPs -- $\beta_3$. 
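The step-up rule described above can be sketched in a few lines of R. The $p$-values below are hypothetical and chosen purely for illustration; they are not results from the paper. The sketch applies the rule by hand, checks it against base R's `p.adjust()` with `method = "BH"`, and shows the Bonferroni decision for comparison.

```r
# Hypothetical p-values for illustration; not results from the paper.
p <- c(0.001, 0.008, 0.039, 0.0395, 0.20)
alpha <- 0.05
m <- length(p)

# Step-up rule: find the largest k such that p_(k) <= (k / m) * alpha
ord <- order(p)
passes <- p[ord] <= (seq_len(m) / m) * alpha
k_max <- if (any(passes)) max(which(passes)) else 0
significant_manual <- rep(FALSE, m)
significant_manual[ord[seq_len(k_max)]] <- TRUE

# Equivalent decision via Benjamini-Hochberg adjusted p-values from base R
p_bh <- p.adjust(p, method = "BH")
significant_adjusted <- p_bh <= alpha

# Bonferroni, for comparison: compare each p-value with alpha / m
significant_bonferroni <- p <= alpha / m

data.frame(p, p_bh, significant_manual, significant_adjusted, significant_bonferroni)
```

In this toy example the step-up rule declares the four smallest $p$-values significant, including 0.039, which exceeds its own threshold of 0.03 but is smaller than the largest $p$-value that meets the criterion. Bonferroni, with a threshold of 0.01, declares only two significant.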

Table \ref{tab:multiple_comparisons} shows the results from this procedure, where I report the unadjusted and Benjamini--Hochberg adjusted $p$-values for the coefficients described above, whether they are significant or not, and whether the correction changes the significance. As is clear from the table, the results presented in the main body of the paper are robust to the correction in all instances except for the model showing the unconditional relationship between MPs' use of evidence and competence evaluations (unconditional evidence competence). After the multiple comparisons correction is applied, this coefficient is no longer statistically significant. 

```{r, results = 'asis'}

load("data/corrected_pvalues.Rdata")
options(scipen=999)

adjusted_unadjusted$uncorrected_pvalues <- round(adjusted_unadjusted$uncorrected_pvalues, 2)
adjusted_unadjusted$corrected_pvalues <- round(adjusted_unadjusted$corrected_pvalues, 2)
adjusted_unadjusted$uncorrected_significant <- ifelse(adjusted_unadjusted$uncorrected_significant==TRUE, "Yes", "No")
adjusted_unadjusted$corrected_significant <- ifelse(adjusted_unadjusted$corrected_significant==TRUE, "Yes", "No")
# Flag models whose significance status changes once the correction is applied
adjusted_unadjusted$difference <- ifelse(
  adjusted_unadjusted$corrected_significant == adjusted_unadjusted$uncorrected_significant,
  "No", "Yes")

out1 <- data.frame(`Model` = adjusted_unadjusted$model_names,
                   `Unadjusted p-values` = adjusted_unadjusted$uncorrected_pvalues, 
                   `Significant?` = adjusted_unadjusted$uncorrected_significant,
                   `Adjusted p-values` = adjusted_unadjusted$corrected_pvalues, 
                   `Significant?` = adjusted_unadjusted$corrected_significant, 
                   `Difference?` = adjusted_unadjusted$difference)

names(out1) <- c("Model name", "Unadjusted p-value", "Significant?", "Adjusted p-value", "Significant?", "Difference?")

kable(out1, caption = "Comparison between unadjusted and Benjamini-Hochberg adjusted p-values \\label{tab:multiple_comparisons}", 
      format = "latex", longtable = T) %>%
  kable_styling(full_width = F, font_size = 10) %>%
  column_spec(1, width = "6.3cm") %>%
  column_spec(c(2,4), width = "3.5cm") %>%
  column_spec(2:6, width = "2cm") %>%
  landscape()

options(scipen=0)

```

\newpage 

# Additional non-pre-registered analyses 

## Pooled models 

In this section, I present two sets of analyses where I pool the style types together. To do so, I first create a variable for whether a style is **female stereotype-congruent**, which takes the value of 0 for female stereotype-incongruent styles (non-emotional style, aggressive style, and statistical evidence) and 1 for female stereotype-congruent styles (emotional style, non-aggressive style, and anecdotal evidence). This allows me to assess how politicians' use of styles which are congruent with female stereotypes affects voters' likeability and competence evaluations of them. For the likeability outcomes, I pool all styles together and estimate the following: 
\begin{eqnarray}
Likeability_i{_(}{_j}{_)} = \alpha + \beta_1 FemaleStereotypeCongruent_{j} + \epsilon_{i}
\end{eqnarray}
\noindent where $\alpha$ describes the average likeability evaluations in female stereotype incongruent styles (non-emotional style, aggressive style, and statistical evidence), and $\alpha$ + $\beta_1$ describes the same quantities for female stereotype congruent styles (emotional style, non-aggressive style, and anecdotal evidence). 

I can, of course, identify the same quantity for the competence outcomes. I therefore also estimate: 
\begin{eqnarray}
Competence_i{_(}{_j}{_)} = \alpha + \beta_1 FemaleStereotypeCongruent_{j} + \epsilon_{i}
\end{eqnarray}
\noindent where $\alpha$ describes the average competence evaluations in female stereotype incongruent styles (non-emotional style, aggressive style, and statistical evidence), and $\alpha$ + $\beta_1$ describes the same quantities for female stereotype congruent styles (emotional style, non-aggressive style, and anecdotal evidence). As there are multiple observations per respondent, I cluster standard errors in both models at the respondent level. The results are presented in table \ref{tab:pooled_unconditional}. 
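As a minimal sketch of the pooled specification and the clustering step, the listing below builds the congruence indicator and computes respondent-clustered standard errors with `sandwich::vcovCL()`. The data are simulated, the column names (`respondent_id`, `style`, `congruent`, `likeability`) are hypothetical stand-ins, and the choice of three observations per respondent is arbitrary; it is not a reproduction of the estimation code behind the table.

```r
library(sandwich)

# Simulated data only; variable names are hypothetical stand-ins.
set.seed(2)
n_respondents <- 300
obs_per_respondent <- 3  # arbitrary choice for the illustration
sim <- data.frame(
  respondent_id = rep(seq_len(n_respondents), each = obs_per_respondent),
  style = sample(c("emotional", "non-emotional", "aggressive",
                   "non-aggressive", "anecdotal", "statistical"),
                 n_respondents * obs_per_respondent, replace = TRUE)
)

# 1 = female stereotype-congruent, 0 = female stereotype-incongruent
sim$congruent <- as.integer(
  sim$style %in% c("emotional", "non-aggressive", "anecdotal"))

# A respondent-level intercept induces within-respondent correlation
respondent_effect <- rnorm(n_respondents)[sim$respondent_id]
sim$likeability <- 3 + 0.3 * sim$congruent + respondent_effect + rnorm(nrow(sim))

pooled <- lm(likeability ~ congruent, data = sim)

# Cluster-robust covariance matrix, clustering on respondent
vcov_cluster <- vcovCL(pooled, cluster = ~ respondent_id)
clustered_se <- sqrt(diag(vcov_cluster))
clustered_se
```

A vector such as `clustered_se` can be passed to `stargazer()` through its `se` argument, which expects a list with one numeric vector per model, for example `se = list(clustered_se)`.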

\bigskip

```{r, results = 'asis'}

load("analysis/tables/unconditional_pooled.Rdata")

stargazer(likeability_all_styles, competence_all_styles, 
          covariate.labels = c("Intercept", "Female Stereotype Congruent"), 
          dep.var.labels = c("Likeability", "Competence"), 
          intercept.bottom = FALSE, 
          header=FALSE, 
          no.space = TRUE, 
          keep.stat = c("n", "rsq", "adj.rsq"), 
          font.size = "footnotesize", 
          se = my_clustered_errors, 
          label="tab:pooled_unconditional",
          title = "Relationship between female stereotype congruent styles, likeability and competence evaluations")

```

In the left hand column, I show the results for the likeability outcomes. Here, the coefficient for female stereotype congruent (emotional style, non-aggressive style, and anecdotal evidence) is positive and significant, suggesting that politicians are evaluated as *more* likeable when they express styles which are congruent with "communal" stereotypes associated with women. This finding perhaps seems intuitive, given that communal stereotypes are associated with being warm, kind, emotional, and people-oriented [@Eagly2012; @Schneider2019]. 

In the right hand column, I show the results for the competence outcomes. Here, the coefficient for female stereotype congruent (emotional style, non-aggressive style, and anecdotal evidence) is negative and significant, suggesting that politicians are evaluated as *less* competent when they express styles which are congruent with "communal" stereotypes associated with women. That voters find politicians to be more competent when they express styles consistent with agentic stereotypes again seems to be an intuitive finding given the compatibility between agentic stereotypes and leadership stereotypes [@Bauer2017a]. The takeaway from these findings, therefore, is that style usage represents a trade-off for politicians. While politicians gain in the likeability assessments when they use "communal" styles, they lose out in their competence evaluations. 

Next, in the conditional analysis by MP gender presented in the main body of the paper, I analyse each style separately. I here present the results from a non-pre-registered analysis where I pool all styles together and compare each treatment back to the control arguments. To do so, I first create a categorical variable for the **style type** of an argument, which takes the values of control (0), emotion (1), aggression (2), statistics (3), and anecdote (4). With this variable in hand, I estimate the following: 
\begin{eqnarray}\label{eq:conditional_likeability_pooled}
Likeability_i{_(}{_j}{_)} = \alpha + \beta_1 WomanMP_{j} + \beta_2 Emotion_{j} + \beta_3 Aggression_{j} + \nonumber \\
                            \beta_4 Statistics_{j} + \beta_5 Anecdote_{j} + \beta_6 (WomanMP \cdot Emotion) +  \nonumber \\
                            \beta_7 (WomanMP \cdot Aggression) + \beta_8 (WomanMP \cdot Statistics) +  \nonumber \\
                            \beta_9 (WomanMP \cdot Anecdote) + \gamma X_i{_(}{_j}{_)} + \epsilon_{i}
\end{eqnarray}
\noindent where $\beta_1$ describes the difference in likeability evaluations between women and men MPs in the control condition. $\beta_2$--$\beta_5$ describe the effect of each style type on likeability evaluations of men MPs compared to the control condition. $\beta_6$--$\beta_9$ describe the difference in the effect of each style type on likeability evaluations for women MPs compared to men. My argument is that women MPs will be rewarded for expressing styles that are congruent with female stereotypes, while they will be punished for expressing female stereotype incongruent styles. As such, I expect the coefficients for $\beta_6$ and $\beta_9$ to be positive, and the coefficients for $\beta_7$ and $\beta_8$ to be negative. $X_i$ represents a control for each policy area (transport, housing, and health). As there are multiple observations per respondent, I cluster standard errors at the respondent level $i$. 
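The way this specification maps onto an R model formula can be sketched as follows. The listing uses simulated data with no built-in effects, and the names `woman_mp`, `style_type`, `policy`, and `likeability` are hypothetical. The point is only to show how a five-level style factor with control as the reference level, interacted with MP gender, yields the coefficients $\beta_1$--$\beta_9$ plus the policy controls.

```r
# Simulated data only; names and values are hypothetical.
set.seed(3)
n <- 1500
style_levels <- c("control", "emotion", "aggression", "statistics", "anecdote")
policy_levels <- c("transport", "housing", "health")

sim <- data.frame(
  woman_mp = rbinom(n, 1, 0.5),
  # The first factor level is the reference category, so control is the baseline
  style_type = factor(sample(style_levels, n, replace = TRUE), levels = style_levels),
  policy = factor(sample(policy_levels, n, replace = TRUE), levels = policy_levels)
)
sim$likeability <- 3 + rnorm(n)  # pure noise: no true effects are built in

# woman_mp * style_type expands to both main effects plus the four interactions
pooled_sketch <- lm(likeability ~ woman_mp * style_type + policy, data = sim)
names(coef(pooled_sketch))
```

The fitted model has twelve coefficients: the intercept, the Woman MP term, four style terms, two policy terms, and four Woman MP by style interactions. Standard errors from `lm()` are not clustered, so a cluster-robust covariance estimator would still need to be applied at the respondent level.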

I also estimate the same model for the competence outcome: 
\begin{eqnarray}\label{eq:conditional_competence_pooled}
Competence_i{_(}{_j}{_)} = \alpha + \beta_1 WomanMP_{j} + \beta_2 Emotion_{j} + \beta_3 Aggression_{j} + \nonumber \\
                            \beta_4 Statistics_{j} + \beta_5 Anecdote_{j} + \beta_6 (WomanMP \cdot Emotion) +  \nonumber \\
                            \beta_7 (WomanMP \cdot Aggression) + \beta_8 (WomanMP \cdot Statistics) +  \nonumber \\
                            \beta_9 (WomanMP \cdot Anecdote) + \gamma X_i{_(}{_j}{_)} + \epsilon_{i} 
\end{eqnarray}
\noindent where $\beta_1$--$\beta_9$ represent the same quantities as above. Again, my argument is that women MPs will be rewarded for stereotype-congruent behaviour and punished for stereotype-incongruent behaviour. As such, I expect that $\beta_7$ will be negative, and $\beta_6$, $\beta_8$, and $\beta_9$ will be positive. $X_i$ again represents a control for each policy area (transport, housing, and health). I cluster standard errors at the respondent level $i$. The results from both models are presented in table \ref{tab:pooled_conditional}. 

```{r, results = 'asis'}

load("analysis/tables/conditional_pooled.Rdata")

stargazer(pooled_likeability, pooled_competence,
          covariate.labels = c("Intercept", "Woman MP", "Emotion", "Aggression", "Statistics", "Anecdote", 
                               "Housing Policy", "Health Policy", "Woman MP*Emotion", "Woman MP*Aggression", 
                               "Woman MP*Statistics", "Woman MP*Anecdote"), 
          dep.var.labels = c("Likeability", "Competence"), 
          intercept.bottom = FALSE, 
          header=FALSE, 
          no.space = TRUE, 
          keep.stat = c("n", "rsq", "adj.rsq"), 
          font.size = "footnotesize", 
          se = my_clustered_errors, 
          label="tab:pooled_conditional",
          title = "Relationship between MP gender, style usage, likeability and competence evaluations")

```

Looking first at the left hand column of table \ref{tab:pooled_conditional}, which shows the results for the likeability outcome, I see that the coefficient for $\beta_1$ (Woman MP) is insignificant. This suggests there is no difference in likeability evaluations between men and women MPs in the control condition. $\beta_2$ (Emotion) and $\beta_4$ (Statistics) are also both insignificant, suggesting that among men, the use of emotional and statistical styles has no statistically significant effect on voters' evaluations of men's likeability relative to the control. However, $\beta_3$ (Aggression) is negative and significant, suggesting that when men deliver aggressive arguments they are perceived as less likeable. Similarly, $\beta_5$ (Anecdote) is positive and significant, and voters therefore ascribe men higher likeability evaluations when using anecdotes compared to the control. The table also shows positive and significant coefficients for both housing and health policy areas, which suggests that voters find politicians to be more likeable in these issue areas compared to in transport, the baseline. 

Interestingly, the coefficients for each of the interaction terms -- $\beta_6$--$\beta_9$ in equation \ref{eq:conditional_likeability_pooled} described above -- are all statistically insignificant. The results from this analysis therefore support the analysis in the main body of the paper: that although style usage influences voters' evaluations of politicians' likeability, these evaluations are *not* gendered. 

Turning to the right hand column of table \ref{tab:pooled_conditional}, I present the results from equation \ref{eq:conditional_competence_pooled}. $\beta_1$ (Woman MP) is again insignificant, and there therefore also appears to be no difference in competence evaluations between men and women MPs in the control condition. $\beta_2$ (Emotion) and $\beta_3$ (Aggression) are, however, both negative and significant, and voters therefore evaluate men as less competent when they use emotional arguments or aggressive arguments compared to the control. Both $\beta_4$ (Statistics) and $\beta_5$ (Anecdote) are insignificant. As in the likeability model, the coefficients for housing and health are both positive and significant, suggesting that voters also ascribe higher competence evaluations to politicians delivering arguments on these issue areas compared to transport. 

The coefficients for each of the interaction terms -- $\beta_6$--$\beta_9$ in equation \ref{eq:conditional_competence_pooled} -- are again statistically insignificant. There is no evidence that women MPs in particular are punished in competence evaluations when they violate stereotypes, nor are they rewarded when they conform to stereotypes, compared to men. Overall, the results from both models support the findings presented in the main paper. 

\newpage

## MP gender and policy area

While not the main quantity of interest, it is possible that men and women MPs may receive differential evaluations in likeability and competence depending on the policy area in question. A rich body of literature has shown that women are stereotyped to be better suited to and more qualified on issues that relate to feminine communal stereotypes, while men are expected to instead have more authority on issues related to the masculine, agentic, and assertive stereotypes with which they are associated [@Huddy1993; @Kahn1996; @Lawless2004; @McDermott1997; @Schneider2019]. Other work has shown that women candidates are more successful when they run on "women's issues" [@Ennser-Jedenastik2017; @Herrnson2003]. Once in office, women politicians introduce and advocate for legislation on "feminine" social policy issues [@Schwindt-Bayer2006; @Swers2002; @Thomas1994], disproportionately participate in debates on women's issues [@Back2019a; @Catalano2009], and may raise traditional women's issues in parliamentary speeches more than men [@Bailer2021; @Hargrave2022]. Further, past experimental work has shown that the policy context may have important implications for the power of gender stereotypes [@Anzia2022; @Holman2011; @Holman2016; @Holman2017]. In addition, women have been said to be more persuasive on "feminine" policy areas and men on "masculine" policy areas, although recent experimental work finds little evidence to support this [@Anderson-Nilsson2021; @Searles2020]. 

In the experimental design described in the main body of the paper, I vary the policy area of the arguments, focusing on housing, health, and transport. While there is some debate in the literature on whether housing and transport policies are stereotypically "feminine" and "masculine" [@Krook2012], health is an area that has commonly been associated with "feminine" stereotypes of being communal and caring [@Catalano2009; @Kittilson2011; @Norris1996]. As such, it may be the case that women are evaluated as more likeable and competent than men on health as opposed to transport or housing policy issues. To assess whether this is the case, I estimate the following: 
\begin{eqnarray}
Likeability_i{_(}{_j}{_)} = \alpha + \beta_1 WomanMP_{j} + \beta_2 HousingPolicy_{j} + \beta_3 HealthPolicy_{j} + \nonumber \\
                           \beta_4 (WomanMP \cdot HousingPolicy) + \beta_5 (WomanMP \cdot HealthPolicy) + \gamma X_i{_(}{_j}{_)} + \epsilon_{i} 
\end{eqnarray}
\noindent where $\beta_1$ describes the difference in likeability between women and men MPs in the transport condition. $\beta_2$--$\beta_3$ describe the effect of housing and health policies on MP likeability evaluations compared to transport among men. $\beta_4$--$\beta_5$ describe the difference in the effect of housing and health policies compared to transport on likeability for women MPs compared to men. $X_i$ represents a control for each style (emotion, aggression, and evidence). As there are multiple observations per respondent, I cluster standard errors at the respondent level $i$. 

Similarly, I can estimate the same quantity for the competence outcomes:
\begin{eqnarray}
Competence_i{_(}{_j}{_)} = \alpha + \beta_1 WomanMP_{j} + \beta_2 HousingPolicy_{j} + \beta_3 HealthPolicy_{j} + \nonumber \\
                           \beta_4 (WomanMP \cdot HousingPolicy) + \beta_5 (WomanMP \cdot HealthPolicy) + \gamma X_i{_(}{_j}{_)} + \epsilon_{i} 
\end{eqnarray}
\noindent where $\beta_1$--$\beta_5$ describe the same quantities described above. I present the results from both models in table \ref{tab:policy_areas}. The coefficients that enable me to see whether voters are awarding differential likeability and competence evaluations to men and women MPs are the coefficients for $\beta_1$ (Woman MP), $\beta_4$ (Woman MP\*Housing Policy), and $\beta_5$ (Woman MP\*Health Policy). As is clear from the table, all three coefficients are statistically insignificant, and, consequently, I see no evidence of differential evaluations for men and women according to policy area. 

```{r, results = 'asis'}

load("analysis/tables/policy.Rdata")

stargazer(likeability_policy, competence_policy, 
          dep.var.labels = c("Likeability", "Competence"), 
          covariate.labels = c("Intercept", "Woman MP", "Housing Policy", "Health Policy", 
                               "Aggression", "Evidence", "Woman MP*Housing Policy", "Woman MP*Health Policy"),
          intercept.bottom = FALSE, 
          header=FALSE, 
          no.space = TRUE, 
          keep.stat = c("n", "rsq", "adj.rsq"), 
          font.size = "footnotesize", 
          se = my_clustered_errors, 
          label="tab:policy_areas",
          title = "Relationship between MP gender, policy area, likeability and competence evaluations")

```


\newpage 

# Heterogeneous effects by voter gender 

How might (gendered) perceptions and evaluations of the styles politicians use vary by voter gender? In the pre-analysis plan, I stated I would carry out an exploratory analysis into how the treatment effects may differ by respondent characteristics. In this section, I carry out two sets of analysis to investigate how the effects may vary by voter gender. First, I assess whether men and women voters are equivalently sensitive to the styles that politicians use. Second, I assess whether men and women voters are differentially sensitive to the extent to which women politicians conform to stereotype-congruent behaviours. I carry out each below. 

## Conditional relationships by voter gender 

Here I am interested in identifying whether men and women voters' evaluations and perceptions are equivalently sensitive to the styles politicians use. That is, might certain kinds of voters' evaluations of the likeability or competence of politicians be more affected by the styles used? I assess whether this is the case by interacting voter gender and style prevalence for each of the outcomes. Figure \ref{fig:voter_conditional_models} shows the results for each outcome and style, and tables \ref{tab:conditional_respondent_emotion_1}--\ref{tab:conditional_respondent_evidence_1} show the full results. 

As described in the main body of the paper, there are few overall differences in how politicians' style usage influences the perceptions and evaluations of men and women voters. For emotion (table \ref{tab:conditional_respondent_emotion_1}), none of the interaction terms is significant, and the same is true for aggression (table \ref{tab:conditional_respondent_aggression_1}). For evidence (table \ref{tab:conditional_respondent_evidence_1}), however, there are some differences. While men voters do not evaluate politicians who use anecdotal evidence as less competent or less evidence-based than those who use statistical evidence, women voters do. Further, while the use of anecdotal evidence improves men voters' likeability assessments of politicians relative to the use of statistical evidence, this is not the case for women voters. Therefore, to the extent that there are differences in how men and women voters evaluate politicians' use of styles, the differences are concentrated in the evidence style type.

```{r, results = 'asis'}

emotion_likeability_covariates <- lm(perceived_likeability ~ respondent_gender*objective_style, 
                                     data = emotion) 

emotion_competence_covariates <- lm(perceived_competence ~ respondent_gender*objective_style,
                                           data = emotion)

emotion_prevalence_covariates <- lm(perceived_emotion ~ respondent_gender*objective_style,
                                           data = emotion)

stargazer(emotion_likeability_covariates, emotion_competence_covariates, emotion_prevalence_covariates, 
          header=FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"), 
          covariate.labels = c("Intercept", "Woman Voter", "Emotional Style", "Woman Voter*Emotional Style"), 
          dep.var.labels = c("Likeability", "Competence", "Emotion"), 
          intercept.bottom = FALSE,
          font.size="footnotesize", 
          label = "tab:conditional_respondent_emotion_1",
          title="Relationship between voter gender and likeability evaluations, competence evaluations, and emotion perceptions")
```

```{r, results = 'asis'}

aggression_likeability_covariates <- lm(perceived_likeability ~ respondent_gender*objective_style, 
                                     data = aggression) 

aggression_competence_covariates <- lm(perceived_competence ~ respondent_gender*objective_style,
                                           data = aggression)

aggression_prevalence_covariates <- lm(perceived_aggression ~ respondent_gender*objective_style,
                                           data = aggression)

stargazer(aggression_likeability_covariates, aggression_competence_covariates, aggression_prevalence_covariates, 
          header=FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"), 
          covariate.labels = c("Intercept", "Woman Voter", "Aggressive Style", "Woman Voter*Aggressive Style"), 
          dep.var.labels = c("Likeability", "Competence", "Aggression"), 
          intercept.bottom = FALSE,
          font.size="footnotesize", 
          label = "tab:conditional_respondent_aggression_1",
          title="Relationship between voter gender and likeability evaluations, competence evaluations, and aggression perceptions")
```

```{r, results = 'asis'}

evidence_likeability_covariates <- lm(perceived_likeability ~ respondent_gender*objective_style, 
                                     data = evidence) 

evidence_competence_covariates <- lm(perceived_competence ~ respondent_gender*objective_style,
                                           data = evidence)

evidence_prevalence_covariates <- lm(perceived_evidence ~ respondent_gender*objective_style,
                                           data = evidence)

stargazer(evidence_likeability_covariates, evidence_competence_covariates, evidence_prevalence_covariates, 
          header=FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"), 
          covariate.labels = c("Intercept", "Woman Voter", "Anecdote", "Woman Voter*Anecdote"), 
          dep.var.labels = c("Likeability", "Competence", "Evidence"), 
          intercept.bottom = FALSE,
          font.size="footnotesize", 
          label = "tab:conditional_respondent_evidence_1", 
          title="Relationship between voter gender and likeability evaluations, competence evaluations, and evidence perceptions")
```

\afterpage{
\blandscape

\begin{figure}
\parbox[c][\textwidth][s]{\linewidth}{%
\vfill
\begin{center}
\includegraphics{analysis/plots/figure_S5_voter_gender_conditional.pdf}
\caption{\textbf{Conditional relationship between voter gender, style treatment status, style perceptions, and MP likeability and competence evaluations.} The emotional style, non-aggressive style, and anecdotal evidence are female stereotype-congruent, and the non-emotional style, aggressive style, and statistical evidence are female stereotype-incongruent.}
\label{fig:voter_conditional_models}
\end{center}
\vfill
}
\end{figure}

\elandscape
}

\newpage

## Conditional relationships by voter and MP gender 

As described in the main body of the paper, I am also interested in identifying whether men and women voters are differentially sensitive to the extent to which politicians conform to or violate stereotype-congruent behaviours. Here, I assess whether this is the case by subsetting the data into men and women voters, and replicating the analysis from the main body of the paper. Tables \ref{tab:conditional_respondent_emotion}--\ref{tab:conditional_respondent_evidence} present the results for each outcome, arranged by style type. 

While for aggression (table \ref{tab:conditional_respondent_aggression}) and evidence (table \ref{tab:conditional_respondent_evidence}) there do not seem to be any differences between men and women voters, this is not the case for emotion (table \ref{tab:conditional_respondent_emotion}). For likeability, among women voters, the coefficient on the interaction term is positive and significant, suggesting that women politicians *in particular* are rewarded for expressing emotional styles instead of non-emotional styles. I do not find the same effect among men voters. For competence, the interaction term is again positive and significant among women voters, suggesting that while women voters find emotional politicians to be less competent than non-emotional politicians, they give women MPs *more* of a competence reward than men MPs. I again see no such effect among men voters. Finally, for perceived styles, the interaction term is again positive and significant. While women voters perceive both men and women politicians as more emotional when they use emotional styles than non-emotional styles, they perceive women MPs *in particular* to be more emotional than men MPs. Again, I do not find the same effect among men voters. 

Women voters therefore give a bigger likeability and competence reward to women politicians who are emotional, and perceive women MPs as more emotional than men MPs. I find no evidence of these effects among men voters. Overall, to the extent that there is any evidence that men and women voters evaluate men and women politicians differently for the styles they use, this effect seems to be concentrated among women voters for the emotion style type. 


```{r, results = 'asis'}

emotion_men <- emotion[emotion$respondent_gender=="Man",]
emotion_women <- emotion[emotion$respondent_gender=="Woman",]

emotion_men_likeability <- lm(perceived_likeability ~ mp_gender*objective_style, data = emotion_men)
emotion_women_likeability <- lm(perceived_likeability ~ mp_gender*objective_style, data = emotion_women)
emotion_men_competence <- lm(perceived_competence ~ mp_gender*objective_style, data = emotion_men)
emotion_women_competence <- lm(perceived_competence ~ mp_gender*objective_style, data = emotion_women)
emotion_men_perceptions <- lm(perceived_emotion ~ mp_gender*objective_style, data = emotion_men)
emotion_women_perceptions <- lm(perceived_emotion ~ mp_gender*objective_style, data = emotion_women)

stargazer(emotion_men_likeability, emotion_women_likeability, emotion_men_competence, emotion_women_competence, emotion_men_perceptions, emotion_women_perceptions, 
          header = FALSE,
          intercept.bottom = FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"),
          dep.var.labels = c("Likeability", "Competence", "Perceived emotion"), 
          column.labels = c("Men", "Women", "Men", "Women", "Men", "Women"), 
          covariate.labels = c("Intercept", "Woman MP", "Emotional Style", "Woman MP*Emotional Style"), 
          title = "Relationship between MP gender, voter gender, and likeability, competence, and perceived emotion for the emotion style", 
          label = "tab:conditional_respondent_emotion", 
          font.size="footnotesize")

```


```{r, results = 'asis'}

aggression_men <- aggression[aggression$respondent_gender=="Man",]
aggression_women <- aggression[aggression$respondent_gender=="Woman",]

aggression_men_likeability <- lm(perceived_likeability ~ mp_gender*objective_style, data = aggression_men)
aggression_women_likeability <- lm(perceived_likeability ~ mp_gender*objective_style, data = aggression_women)
aggression_men_competence <- lm(perceived_competence ~ mp_gender*objective_style, data = aggression_men)
aggression_women_competence <- lm(perceived_competence ~ mp_gender*objective_style, data = aggression_women)
aggression_men_perceptions <- lm(perceived_aggression ~ mp_gender*objective_style, data = aggression_men)
aggression_women_perceptions <- lm(perceived_aggression ~ mp_gender*objective_style, data = aggression_women)

stargazer(aggression_men_likeability, aggression_women_likeability, aggression_men_competence, aggression_women_competence, aggression_men_perceptions, aggression_women_perceptions, 
          header = FALSE,
          intercept.bottom = FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"),
          dep.var.labels = c("Likeability", "Competence", "Perceived aggression"), 
          column.labels = c("Men", "Women", "Men", "Women", "Men", "Women"), 
          covariate.labels = c("Intercept", "Woman MP", "Aggressive Style", "Woman MP*Aggressive Style"), 
          title = "Relationship between MP gender, voter gender, and likeability, competence, and perceived aggression for the aggression style", 
          label = "tab:conditional_respondent_aggression", 
          font.size="footnotesize")

```

```{r, results = 'asis'}

evidence_men <- evidence[evidence$respondent_gender=="Man",]
evidence_women <- evidence[evidence$respondent_gender=="Woman",]

evidence_men_likeability <- lm(perceived_likeability ~ mp_gender*objective_style, data = evidence_men)
evidence_women_likeability <- lm(perceived_likeability ~ mp_gender*objective_style, data = evidence_women)
evidence_men_competence <- lm(perceived_competence ~ mp_gender*objective_style, data = evidence_men)
evidence_women_competence <- lm(perceived_competence ~ mp_gender*objective_style, data = evidence_women)
evidence_men_perceptions <- lm(perceived_evidence ~ mp_gender*objective_style, data = evidence_men)
evidence_women_perceptions <- lm(perceived_evidence ~ mp_gender*objective_style, data = evidence_women)

stargazer(evidence_men_likeability, evidence_women_likeability, evidence_men_competence, evidence_women_competence, evidence_men_perceptions, 
          evidence_women_perceptions, 
          header = FALSE,
          intercept.bottom = FALSE, 
          no.space=TRUE, keep.stat=c("n","rsq","adj.rsq"),
          dep.var.labels = c("Likeability", "Competence", "Perceived evidence"), 
          column.labels = c("Men", "Women", "Men", "Women", "Men", "Women"), 
          covariate.labels = c("Intercept", "Woman MP", "Anecdote", "Woman MP*Anecdote"), 
          title = "Relationship between MP gender, voter gender, and likeability, competence, and perceived evidence for the evidence style", 
          label = "tab:conditional_respondent_evidence", 
          font.size="footnotesize")
```

\bigskip

I present the results from this analysis graphically in figure \ref{fig:voter_mp_conditional_models}. The rows represent the styles and the columns represent the outcomes. Each panel shows the four combinations of voter gender and MP gender: men voters' evaluations of men politicians are displayed as dark blue squares, men voters' evaluations of women politicians as light blue circles, women voters' evaluations of men politicians as yellow triangles, and women voters' evaluations of women politicians as maroon diamonds. 

\afterpage{
\blandscape

\begin{figure}
\parbox[c][\textwidth][s]{\linewidth}{%
\vfill
\begin{center}
\includegraphics{analysis/plots/figure_S6_mp_voter_gender_conditional.pdf}
\caption{\textbf{Conditional relationship between MP gender, voter gender, style treatment status, style perceptions, and MP likeability and competence evaluations.} The emotional style, non-aggressive style, and anecdotal evidence are female stereotype-congruent, and the non-emotional style, aggressive style, and statistical evidence are female stereotype-incongruent.}
\label{fig:voter_mp_conditional_models}
\end{center}
\vfill
}
\end{figure}

\elandscape
}



\newpage 

# Power analysis 

Figure \ref{fig:power_analysis} shows the results of a power analysis for the main effects outlined in the main paper. The power analysis was conducted after the experiment and, as such, the sample size remains fixed. To construct the power analysis, I simulated the data collection process for the fixed sample of 1,600 respondents I have available, under a range of hypothetical standardised effect sizes for the interaction between style treatment and MP gender, from very small (0.01 standard deviations) to large (0.8 standard deviations) by conventional Cohen's $d$ standards. Note that, to simplify the power analysis, I treat the styles as separate factorial designs, each with a treatment and control condition and a binary moderator (MP gender), and I treat the outcome as a continuous variable rather than a 5-point Likert scale. 

On the basis of an analysis with 1,600 respondents, the fixed number I have available, the power analysis suggests that I have 80% power to detect a standardised interaction effect of 0.28. While this is reasonably small by conventional Cohen's $d$ standards, it is also roughly comparable to the effects I estimate for the style treatments for men MPs across the various styles and outcomes. This suggests that I cannot confidently rule out the possibility of non-negligible interaction effects, but any interactions that do exist are nevertheless unlikely to be large relative to the overall variance in the outcome variables I study. 
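The simulation logic described above can be illustrated with a minimal sketch. This is not the code used to produce figure \ref{fig:power_analysis}: the function name `simulate_power`, the number of simulations, the equal-probability assignment to conditions, and the unit-variance normal outcome are all assumptions made for illustration.

```r
# Illustrative sketch only: simulated power for the interaction in a 2 x 2 design.
# All names and settings below are hypothetical, not taken from the replication code.
set.seed(2021)

simulate_power <- function(effect_size, n = 1600, n_sims = 500, alpha = 0.05) {
  rejections <- replicate(n_sims, {
    style    <- rbinom(n, 1, 0.5)  # style treatment indicator (assumed 50/50)
    woman_mp <- rbinom(n, 1, 0.5)  # MP gender moderator (assumed 50/50)
    # Continuous outcome with unit residual variance, so the interaction
    # coefficient is expressed in standard-deviation units
    y <- effect_size * style * woman_mp + rnorm(n)
    fit <- lm(y ~ style * woman_mp)
    # Record whether the interaction term is significant at the chosen level
    summary(fit)$coefficients["style:woman_mp", "Pr(>|t|)"] < alpha
  })
  mean(rejections)  # share of simulations rejecting the null = estimated power
}

effect_sizes <- c(0.01, 0.1, 0.2, 0.28, 0.4, 0.8)
power_curve  <- sapply(effect_sizes, simulate_power)
data.frame(effect_size = effect_sizes, power = round(power_curve, 2))
```

Under these assumptions, each of the four cells holds roughly 400 respondents, so the standard error of the interaction is approximately $\sqrt{4/400} = 0.1$. A normal approximation then gives close to 80% power at an effect of 0.28, in line with the figure reported above.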

\afterpage{

\vspace*{\fill}
\begin{figure}
\includegraphics{analysis/plots/figure_S7_power_analysis.pdf}
\caption{Power analysis}
\label{fig:power_analysis}
\end{figure}
\vfill
}

\newpage 

# References


